872 research outputs found

    Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labelling

    Conditional Random Fields (CRFs) constitute a popular and efficient approach for supervised sequence labelling. CRFs can cope with large description spaces and can integrate some form of structural dependency between labels. In this contribution, we address the issue of efficient feature selection for CRFs based on imposing sparsity through an L1 penalty. We first show how sparsity of the parameter set can be exploited to significantly speed up training and labelling. We then introduce coordinate descent parameter update schemes for CRFs with L1 regularization. We finally provide some empirical comparisons of the proposed approach with state-of-the-art CRF training strategies. In particular, it is shown that the proposed approach is able to take advantage of the sparsity to speed up processing and hence potentially handle larger dimensional models.
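
    As a rough illustration of how an L1 penalty produces exact zeros during coordinate descent, the sketch below shows the generic soft-thresholding update for one parameter. It is not taken from the paper: the abstract does not give the update rule, and the names `soft_threshold`, `coordinate_update`, `grad_j` and `curvature_j` are hypothetical. It assumes the smooth part of the objective is approximated by a quadratic in the single coordinate being updated.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Return the w minimising 0.5 * (w - z)**2 + lam * abs(w)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small values are set exactly to zero: this is the sparsity


def coordinate_update(w_j: float, grad_j: float, curvature_j: float, lam: float) -> float:
    """One L1-penalised update of a single weight (illustrative assumption).

    Minimises grad_j * (w - w_j) + 0.5 * curvature_j * (w - w_j)**2 + lam * abs(w),
    i.e. a quadratic model of the smooth loss plus the L1 penalty.
    curvature_j must be strictly positive.
    """
    z = w_j - grad_j / curvature_j           # unpenalised Newton-like step
    return soft_threshold(z, lam / curvature_j)


if __name__ == "__main__":
    print(coordinate_update(0.0, -4.0, 2.0, 1.0))   # weight moves away from zero
    print(coordinate_update(0.1, 0.3, 2.0, 1.0))    # weight is clipped to zero
```

    Weights clipped to zero by this update can be skipped in later computations, which is the kind of saving the abstract refers to.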

    An Operational SSL HF System (MILCOM 2007)

    This paper presents an operational HF (3-30 MHz) system designed for single site localization (SSL) of transmitters involved in trans-horizon radio links. It associates the estimation of the directions of arrival of incident radio waves refracted by the ionosphere with ray tracing software based on the PRIME model of the channel. The direction finding processing is implemented on an array of non-identical sensors that exhibits polarization sensitivity. A specific version of the MUSIC algorithm jointly estimates the angles of arrival (azimuth and elevation) of incident waves and their polarization. Statistics of the angles of arrival (mean values and standard deviations) are the input data of ray tracing software based on the PRIME model of the ionosphere, which computes the estimated position of the transmitter. Numerous radio links have been tested for long distances up to 2000 km. Very good agreement is observed between the exact and the estimated positions of the transmitters, with a standard localization error of less than 10% of the distance to the receiving system.

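    Du quatrième de proportion comme principe inductif : une proposition et son application à l'apprentissage de la morphologie (The fourth term of a proportion as an inductive principle: a proposal and its application to learning morphology)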

    We present a model of learning by analogy that exploits the notion of formal analogical proportions; this approach presupposes that these proportions can be given a meaning and that their computation can be implemented efficiently. We propose an algebraic definition of this notion, valid for the structures commonly used for linguistic representations: words over a finite alphabet, attribute-value structures, and labelled trees. We then present an application to a concrete task, which consists of learning to analyse unknown orthographic forms morphologically. Experimental results on several lexicons make it possible to assess the validity of our approach.

    Measuring text readability with machine comprehension: a pilot study

    This article studies the relationship between text readability indices and automatic machine understanding systems. Our hypothesis is that the simpler a text is, the better it should be understood by a machine. We thus expect a strong correlation between readability levels on the one hand, and performance of automatic reading systems on the other hand. We test this hypothesis with several understanding systems based on language models of varying strengths, measuring this correlation on two corpora of journalistic texts. Our results suggest that this correlation is rather small and that existing comprehension systems are far from reproducing the gradual improvement of their performance on texts of decreasing complexity.

    Learning the Structure of Variable-Order CRFs: a finite-state perspective

    The computational complexity of linear-chain Conditional Random Fields (CRFs) makes it difficult to deal with very large label sets and long range dependencies. Such situations are not rare and arise when dealing with morphologically rich languages or joint labelling tasks. We extend here recent proposals to consider variable order CRFs. Using an effective finite-state representation of variable-length dependencies, we propose new ways to perform feature selection at large scale and report experimental results where we outperform strong baselines on a tagging task

    Evaluating Subtitle Segmentation for End-to-end Generation Systems

    Subtitles appear on screen as short pieces of text, segmented based on formal constraints (length) and syntactic/semantic criteria. Subtitle segmentation can be evaluated with sequence segmentation metrics against a human reference. However, standard segmentation metrics cannot be applied when systems generate outputs different than the reference, e.g. with end-to-end subtitling systems. In this paper, we study ways to conduct reference-based evaluations of segmentation accuracy irrespective of the textual content. We first conduct a systematic analysis of existing metrics for evaluating subtitle segmentation. We then introduce Sigma, a new Subtitle Segmentation Score derived from an approximate upper-bound of BLEU on segmentation boundaries, which allows us to disentangle the effect of good segmentation from text quality. To compare Sigma with existing metrics, we further propose a boundary projection method from imperfect hypotheses to the true reference. Results show that all metrics are able to reward high quality output but for similar outputs system ranking depends on each metric's sensitivity to error type. Our thorough analyses suggest Sigma is a promising segmentation candidate but its reliability over other segmentation metrics remains to be validated through correlations with human judgements.

    BiSync: A Bilingual Editor for Synchronized Monolingual Texts

    In our globalized world, a growing number of situations arise where people are required to communicate in one or several foreign languages. In the case of written communication, users with a good command of a foreign language may find assistance from computer-aided translation (CAT) technologies. These technologies often allow users to access external resources, such as dictionaries, terminologies or bilingual concordancers, thereby interrupting and considerably hindering the writing process. In addition, CAT systems assume that the source sentence is fixed and also restrict the possible changes on the target side. In order to make the writing process smoother, we present BiSync, a bilingual writing assistant that allows users to freely compose text in two languages, while maintaining the two monolingual texts synchronized. We also include additional functionalities, such as the display of alternative prefix translations and paraphrases, which are intended to facilitate the authoring of texts. We detail the model architecture used for synchronization and evaluate the resulting tool, showing that high accuracy can be attained with limited computational resources. The interface and models are publicly available at https://github.com/jmcrego/BiSync and a demonstration video can be watched on YouTube at https://youtu.be/_l-ugDHfNgU. Comment: ACL 2023 System Dem

    Cross-lingual alignment transfer: a chicken-and-egg story?

    In this paper, we challenge a basic assumption of many cross-lingual transfer techniques: the availability of word aligned parallel corpora, and consider ways to accommodate situations in which such resources do not exist. We show experimentally that, here again, weakly supervised cross-lingual learning techniques can prove useful, once adapted to transfer knowledge across pairs of languages.

    Reassessing the proper place of man and machine in translation: a pre-translation scenario

    Traditionally, human-machine interaction to reach an improved machine translation (MT) output takes place ex-post and consists of correcting this output. In this work, we investigate other modes of intervention in the MT process. We propose a Pre-Edition protocol that involves: (a) the detection of MT translation difficulties; (b) the resolution of those difficulties by a human translator, who provides their translations (pre-translation); and (c) the integration of the obtained information prior to the automatic translation. This approach can meet individual interaction preferences of certain translators and can be particularly useful for production environments, where more control over output quality is needed. Early resolution of translation difficulties can prevent downstream errors, thus improving the final translation quality "for free". We show that translation difficulty can be reliably predicted for English for various source units. We demonstrate that the pre-translation information can be successfully exploited by an MT system and that the indirect effects are genuine, accounting for around 16% of the total improvement. We also provide a study of the human effort involved in the resolution process.
